Question answering apparatus, question answering method, and program

ABSTRACT

Provided is an apparatus including an answer generation unit configured to receive a document and a question as inputs, and to execute processing of generating an answer sentence for the question by a learned model, using words included in a union of a predetermined first vocabulary and a second vocabulary composed of words included in the document and the question. The learned model includes a learned neural network that has been trained in advance to identify whether a word included in the answer sentence is included in the second vocabulary, and that increases or decreases a probability at which a word included in the second vocabulary is selected as a word included in the answer sentence at the time of generating the answer sentence.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a U.S. 371 Application of International Patent Application No. PCT/JP2019/013069, filed on 27 Mar. 2019, which application claims priority to and the benefit of JP Application No. 2018-082521, filed on 23 Apr. 2018, the disclosures of which are hereby incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present disclosure relates to a question answering apparatus, a question answering method, and a program.

BACKGROUND ART

If “reading comprehension” for generating an answer sentence for a given document and question can be accurately performed by artificial intelligence, it can be applied to a wide range of services such as question answering and intelligent agents.

As a technology in related art for reading comprehension, there is a technology disclosed in, for example, Non Patent Literature 1. In the technology in related art disclosed in Non Patent Literature 1 and the like, for example, a word sequence of a document and a question is encoded (vectorized), a vector expression of each word sequence is matched, and then an answer is generated based on a content of the document.

CITATION LIST

Non Patent Literature

-   Non Patent Literature 1: Chuanqi Tan, Furu Wei, Nan Yang, Weifeng Lv, Ming Zhou: S-Net: From Answer Extraction to Answer Generation for Machine Reading Comprehension. CoRR abs/1706.04815 (2017)

SUMMARY OF THE INVENTION

Technical Problem

On the other hand, in reading comprehension, words included in answers are often included in questions and documents. However, in the technology in related art disclosed in Non Patent Literature 1 and the like, an answer is generated from words included in a specific vocabulary (for example, a vocabulary composed of words frequently appearing in a general document). Accordingly, if a word that is not included in the vocabulary (for example, a word such as a proper noun or a technical term) exists in the document, the word is treated as an unknown word, and a highly accurate answer sentence may not be obtained.

An embodiment of the present disclosure has been made in view of the above points, and has an object to provide a highly accurate answer to a question.

Means for Solving the Problem

To achieve the above object, an embodiment of the present invention includes an answer generation unit configured to receive a document and a question as inputs, and to execute processing of generating an answer sentence for the question by a learned model, using words included in a union of a predetermined first vocabulary and a second vocabulary composed of words included in the document and the question, in which the learned model includes a learned neural network that has been trained in advance to identify whether a word included in the answer sentence is included in the second vocabulary, and increases or decreases a probability at which a word included in the second vocabulary is selected as a word included in the answer sentence at the time of generating the answer sentence.

Effects of the Invention

A highly accurate answer to a question can be realized.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram illustrating an example of a functional configuration of a question answering apparatus at the time of answering a question in an embodiment of the present disclosure.

FIG. 2 is a diagram illustrating an example of a functional configuration of the question answering apparatus at the time of learning according to the embodiment of the present disclosure.

FIG. 3 is a diagram illustrating an example of data stored in a word vector storage unit.

FIG. 4 is a diagram illustrating an example of a hardware configuration of a question answering apparatus according to the embodiment of the present disclosure.

FIG. 5 is a flowchart illustrating an example of learning processing according to the embodiment of the present disclosure.

FIG. 6 is a flowchart illustrating an example of parameter update processing according to the embodiment of the present disclosure.

FIG. 7 is a flowchart illustrating an example of question answering processing according to the embodiment of the present disclosure.

DESCRIPTION OF EMBODIMENTS

An embodiment of the present disclosure will be described below with reference to the drawings. Hereinafter, a description will be given of a question answering apparatus 100 that provides a highly accurate answer to a question by accurately including words included in a given document and question in an answer sentence.

The embodiment described below is merely an example, and the embodiment to which the present disclosure is applied is not limited to the following embodiment. For example, a technology according to the embodiment of the present disclosure can be used for question answering related to a specialized document, but is not limited to this, and can be used for various cases.

Overview

In the embodiment of the present disclosure, when an arbitrary document and an arbitrary question sentence (hereinafter, simply referred to as a “question”) for the document are given, the question answering apparatus 100 uses a sentence generation technology by a neural network to generate a word string that is an answer sentence for the document and the question. At this time, according to the embodiment of the present disclosure, not only a specific vocabulary (for example, a vocabulary composed of words frequently appearing in a general document, which is a vocabulary V described later) but also a vocabulary (a vocabulary B described later) composed of words included in the document and the question that are given to the question answering apparatus 100 is used to generate an answer sentence to the question. By outputting the answer sentence, question answering for the given document and question is performed.

More specifically, in the embodiment of the present disclosure, when words included in an answer sentence are generated by a neural network, an appearance probability of a word included in a document and a question is increased or decreased. Accordingly, in the embodiment of the present disclosure, words included in the document and the question can be accurately included in the answer sentence.

Additionally, in the embodiment of the present disclosure, in order to generate the answer sentence mentioned above, a neural network that identifies whether a word included in an answer sentence is included in the given document or question is learned.

Here, in the embodiment of the present disclosure, a vocabulary composed of words frequently appearing in a general document is represented as “V”, and a vocabulary composed of words appearing in the document and the question given to the question answering apparatus 100 is represented as “B”. Additionally, the vocabulary represented by the union of the vocabulary V and the vocabulary B is represented as “V′”.

The vocabulary V can be constituted with, for example, a set of tens of thousands to hundreds of thousands of words with the highest frequencies of occurrence among words appearing in a general document set such as a large text set. Additionally, the vocabulary B can be constituted with, for example, the set of words appearing in the document and the question given to the question answering apparatus 100. Note that the vocabulary V′ also includes special words (for example, <s> and </s>) representing the beginning or the end of a sentence.
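As an illustration only (the corpus, whitespace tokenization, and vocabulary size below are assumptions of this sketch, not part of the disclosure), the vocabularies V, B, and V′ could be constructed in Python as follows:

```python
from collections import Counter

def build_vocabulary_V(general_documents, max_size=50_000):
    """Vocabulary V: the most frequent words of a general document set.
    The tokenizer and the size limit are illustrative choices."""
    counts = Counter(word for doc in general_documents for word in doc.split())
    return {word for word, _ in counts.most_common(max_size)}

def build_vocabulary_B(document, question):
    """Vocabulary B: all words appearing in the given document and question."""
    return set(document.split()) | set(question.split())

def build_vocabulary_V_prime(V, B):
    """V': the union of V and B, plus the special sentence-boundary words."""
    return V | B | {"<s>", "</s>"}
```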

Functional Configuration of Question Answering Apparatus 100

First, a functional configuration of the question answering apparatus 100 at the time of question answering in the embodiment of the present disclosure will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of a functional configuration of the question answering apparatus 100 at the time of question answering in an embodiment of the present disclosure.

As illustrated in FIG. 1, the question answering apparatus 100 at the time of question answering includes a word vector storage unit 101, an input unit 102, a word sequence coding unit 103, a word sequence matching unit 104, a document gaze unit 105, a question gaze unit 106, an answer generation unit 107, and an output unit 108.

Additionally, a functional configuration of the question answering apparatus 100 at the time of learning according to the embodiment of the present disclosure will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of a functional configuration of the question answering apparatus 100 at the time of learning according to the embodiment of the present disclosure.

As illustrated in FIG. 2, the question answering apparatus 100 at the time of learning differs from the functional configuration of the question answering apparatus 100 at the time of question answering in that a parameter update unit 109 is included instead of the output unit 108. Other functional configurations of the question answering apparatus 100 at the time of learning are the same as the functional configurations of the question answering apparatus 100 at the time of question answering. However, the question answering apparatus 100 at the time of learning may include the output unit 108. That is, the question answering apparatus 100 at the time of learning may have a functional configuration obtained by adding the parameter update unit 109 to the functional configuration of the question answering apparatus 100 at the time of question answering.

The word vector storage unit 101 stores sets of a word and a word vector representing the word as a vector. Hereinafter, it is assumed that the word vector has E dimensions. A set of a word and a word vector can be generated by, for example, a method disclosed in Reference 1 below. Note that E may be set as, for example, E=300.

Reference 1

-   Thomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.

Here, an example of data (a set of a word and a word vector) stored in the word vector storage unit 101 is illustrated in FIG. 3. FIG. 3 is a diagram illustrating an example of data stored in the word vector storage unit 101. As illustrated in FIG. 3, in the word vector storage unit 101, for example, words such as “go”, “write”, and “baseball” are each associated with a word vector representing the word as a vector.

It is assumed that the word vector of a word not stored in the word vector storage unit 101 is an E-dimensional zero vector. Additionally, it is assumed that the word vector of the special word “PAD”, which is used for padding when creating a vector sequence of a predetermined length, is also an E-dimensional zero vector.
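A minimal sketch of this lookup rule, modeling the word vector storage unit 101 as a plain Python dictionary (an assumption of this example):

```python
import numpy as np

E = 300  # dimensionality of the word vectors

def lookup_word_vector(word_vector_store, word):
    """Return the stored vector for `word`; unknown words and the special
    padding word "PAD" are both mapped to the E-dimensional zero vector."""
    if word == "PAD":
        return np.zeros(E)
    return word_vector_store.get(word, np.zeros(E))
```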

The input unit 102 inputs the document and the question given at the time of question answering. Additionally, at the time of learning, the input unit 102 inputs a training data set (that is, a set (data set) constituted with training data, each piece of which is a set of a document, a question, and a correct answer sentence).

The word sequence coding unit 103 converts the sequence of words (hereinafter referred to as a “word sequence”) included in the document input by the input unit 102 into a vector sequence (hereinafter referred to as a “first document vector sequence”) in which each word constituting the word sequence is represented by a vector. Additionally, the word sequence coding unit 103 converts the first document vector sequence into a second document vector sequence using an encoder based on a neural network.

The word sequence coding unit 103 also converts the word sequence included in the question input by the input unit 102 into a vector sequence (hereinafter referred to as a “first question vector sequence”) in which each word constituting the word sequence is represented by a vector. Additionally, the word sequence coding unit 103 converts the first question vector sequence into a second question vector sequence using an encoder based on a neural network.

The word sequence matching unit 104 calculates a matching matrix for the second document vector sequence and the second question vector sequence obtained by the word sequence coding unit 103.

The document gaze unit 105 calculates a gaze of each word included in the document by using the matching matrix calculated by the word sequence matching unit 104.

The question gaze unit 106 calculates a gaze of each word included in the question by using the matching matrix calculated by the word sequence matching unit 104.

The answer generation unit 107 uses a decoder in accordance with a neural network, a gaze distribution of each word included in the document calculated by the document gaze unit 105 (hereinafter referred to as a “gaze distribution”), a gaze distribution of each word included in the question calculated by the question gaze unit 106, and a probability distribution of words included in the vocabulary V to calculate a score at which each word included in the vocabulary V′ is selected as a word included in the answer sentence. Then, at the time of question answering, the answer generation unit 107 selects each word included in the answer sentence from the vocabulary V′ using the calculated score. Accordingly, the answer sentence is generated.

The output unit 108 outputs the answer sentence generated by the answer generation unit 107 at the time of question answering. The output destination of the answer sentence is not limited. The output destination of the answer sentence includes, for example, a display device such as a display, an auxiliary storage device such as a Hard Disk Drive (HDD), a voice output apparatus such as a speaker, and other apparatuses connected via a communication network.

At the time of learning, the parameter update unit 109 calculates a loss using the probability distribution indicating the score calculated by the answer generation unit 107 and the correct answer sentence input by the input unit 102. Then, the parameter update unit 109 uses the loss to update a parameter by an arbitrary optimization method. Accordingly, a neural network for generating the words included in the answer sentence is learned.

After the parameter is updated using the training data set, the question answering apparatus 100 may use a document and a question not included in the training data set (data indicating a document and a question not included in the training data set is also referred to as “test data”) to perform evaluation of the recognition accuracy of the learned neural network (i.e., evaluation related to the accuracy of the answer sentence generated by the answer generation unit 107).

Hardware Configuration of Question Answering Apparatus 100

Next, a hardware configuration of the question answering apparatus 100 according to the embodiment of the present disclosure will be described with reference to FIG. 4. FIG. 4 is a diagram illustrating an example of a hardware configuration of the question answering apparatus 100 according to the embodiment of the present disclosure.

As illustrated in FIG. 4, the question answering apparatus 100 according to the embodiment of the present disclosure includes an input device 151, a display device 152, an external I/F 153, a Random Access Memory (RAM) 154, a Read Only Memory (ROM) 155, an operation device 156, a communication I/F 157, and an auxiliary storage device 158. Each of these pieces of hardware is communicably connected via a bus B.

The input device 151 is, for example, a keyboard, a mouse, a touch panel, and the like, and is used by a user to input various operations. The display device 152 is, for example, a display, and displays a processing result of the question answering apparatus 100 (for example, an answer sentence to a question). Note that the question answering apparatus 100 does not need to include at least one of the input device 151 and the display device 152.

The external I/F 153 is an interface with an external apparatus. The external apparatus includes a recording medium 153a and the like. The question answering apparatus 100 can read from and write to the recording medium 153a and the like via the external I/F 153. The recording medium 153a may record at least one program and the like that realizes each functional unit included in the question answering apparatus 100.

Examples of the recording medium 153a include a floppy disk, a Compact Disc (CD), a Digital Versatile Disk (DVD), a Secure Digital memory card (SD memory card), and a Universal Serial Bus (USB) memory card.

The RAM 154 is a volatile semiconductor memory that temporarily stores a program and data. The ROM 155 is a non-volatile semiconductor memory that can retain a program and data even when the power is turned off. The ROM 155 stores, for example, settings related to an operating system (OS) and settings related to a communication network.

The operation device 156 is, for example, a central processing unit (CPU) or a graphics processing unit (GPU), and reads a program or data from the ROM 155, the auxiliary storage device 158, and the like onto the RAM 154 to execute processing. Each functional unit included in the question answering apparatus 100 works by, for example, processing in which at least one program stored in the auxiliary storage device 158 is executed by the operation device 156. Note that the question answering apparatus 100 may include both the CPU and the GPU as the operation device 156, or may include only one of the CPU and the GPU.

The communication I/F 157 is an interface to connect the question answering apparatus 100 to a communication network. At least one program realizing each functional unit included in the question answering apparatus 100 may be acquired (downloaded) from a predetermined server apparatus and the like via the communication I/F 157.

The auxiliary storage device 158 is, for example, an HDD or a solid state drive (SSD), and is a non-volatile storage apparatus that stores programs and data. The programs and data stored in the auxiliary storage device 158 include, for example, an OS and at least one program that realizes each functional unit included in the question answering apparatus 100.

The question answering apparatus 100 according to the embodiment of the present disclosure has the hardware configuration illustrated in FIG. 4 and thus can perform the various processing described below. In the example illustrated in FIG. 4, a case has been described where the question answering apparatus 100 according to the embodiment of the present disclosure is achieved by one apparatus (computer), but the present disclosure is not limited to this. The question answering apparatus 100 according to the embodiment of the present disclosure may be achieved by a plurality of apparatuses (computers).

Learning Processing

Hereinafter, learning processing executed by the question answering apparatus 100 according to the embodiment of the present disclosure will be described with reference to FIG. 5. FIG. 5 is a flowchart illustrating an example of the learning processing according to the embodiment of the present disclosure. As described above, the question answering apparatus 100 at the time of learning includes each functional unit illustrated in FIG. 2.

Step S101: The input unit 102 inputs a training data set. The input unit 102 may, for example, input a training data set stored in the auxiliary storage device 158, the recording medium 153a, and the like, or may input a training data set acquired (downloaded) from a predetermined server apparatus and the like via the communication I/F 157.

Step S102: The input unit 102 initializes the number of epochs n_(e), which indicates the number of times of learning performed on the training data set, to 1. The maximum value of the number of epochs n_(e) is a hyperparameter N_(e). N_(e) may be set as, for example, N_(e)=15.

Step S103: The input unit 102 divides the training data set into N_(b) mini-batches. The division number N_(b) of the mini-batches is a hyperparameter. N_(b) may be set as, for example, N_(b)=60.

Step S104: The question answering apparatus 100 repeatedly executes parameter update processing for each of the N_(b) mini-batches. That is, the question answering apparatus 100 calculates a loss using a mini-batch, and then updates the parameter by an arbitrary optimization method using the calculated loss. The details of the parameter update processing will be described later.

Step S105: The input unit 102 judges whether the number of epochs n_(e) is greater than N_(e)−1. When the number of epochs n_(e) is not judged to be greater than N_(e)−1, the question answering apparatus 100 executes the processing of step S106. On the other hand, when the number of epochs n_(e) is judged to be greater than N_(e)−1, the question answering apparatus 100 ends the learning processing.

Step S106: The input unit 102 adds 1 to the number of epochs n_(e). Then, the question answering apparatus 100 executes the processing of step S103. Accordingly, the processing of step S103 and step S104 is repeatedly executed N_(e) times using the training data set input in step S101.
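The control flow of steps S101 to S106 can be summarized by the following sketch; `update_parameters_on` stands in for the parameter update processing of step S104 (FIG. 6) and is a hypothetical name, and shuffling the data before each division is an assumption of this sketch rather than a stated part of the processing:

```python
import random

N_e = 15  # maximum number of epochs (hyperparameter, step S102)
N_b = 60  # number of mini-batches (hyperparameter, step S103)

def split_into_minibatches(training_data, n_batches):
    """Step S103: divide the training data set into n_batches mini-batches.
    The per-epoch shuffle is an assumption of this sketch."""
    data = list(training_data)
    random.shuffle(data)
    return [data[i::n_batches] for i in range(n_batches)]

def train(training_data, update_parameters_on):
    """Steps S102 to S106: repeat the mini-batch updates for N_e epochs."""
    for _ in range(N_e):  # epochs n_e = 1, ..., N_e
        for minibatch in split_into_minibatches(training_data, N_b):
            update_parameters_on(minibatch)  # step S104, detailed in FIG. 6
```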

Parameter Update Processing

Here, the parameter update processing in step S104 will be described with reference to FIG. 6. FIG. 6 is a flowchart illustrating an example of the parameter update processing according to the embodiment of the present disclosure. Hereinafter, the parameter update processing will be described for a certain one of the N_(b) mini-batches.

Step S201: The input unit 102 acquires one piece of training data from the mini-batch. Note that the training data is a set of a document, a question, and a correct answer sentence (that is, data represented by (document, question, correct answer sentence)). Hereinafter, the terms “document”, “question”, and “correct answer sentence” indicate the document, the question, and the correct answer sentence included in the training data acquired in step S201, respectively.

Step S202: The word sequence coding unit 103 obtains a first document vector sequence X and a second document vector sequence H by the following step S202-1 and step S202-2.

Step S202-1: The word sequence coding unit 103 searches the word vector storage unit 101 for each word included in the word sequence (x₁, x₂, . . . , x_(T)) from the beginning of the document to the T-th word, and converts each word x_(t) (t=1, 2, . . . , T) into a word vector e_(t), respectively. Then, the word sequence coding unit 103 obtains the first document vector sequence X=[e₁ e₂ . . . e_(T)]∈R^(E×T) by using the word vectors e_(t) (t=1, 2, . . . , T) as a vector sequence. Accordingly, the word sequence (x₁, x₂, . . . , x_(T)) of the document is converted into the first document vector sequence X. Here, T is the length of the word sequence; for example, T may be set to T=400.

When the length of the word sequence of the document is less than T, padding is performed with the special word “PAD”. On the other hand, when the length of the word sequence of the document exceeds T, the word sequence in the excess portion is ignored.
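A sketch of step S202-1 under these padding and truncation rules, again modeling the word vector storage unit as a dictionary (an assumption of this example):

```python
import numpy as np

E = 300  # word vector dimensionality
T = 400  # fixed length of the document word sequence

def first_vector_sequence(words, word_vector_store, length=T):
    """Pad with "PAD" (zero vector) or truncate to `length`, then stack the
    word vectors column-wise into an E x length matrix X (step S202-1)."""
    padded = (list(words) + ["PAD"] * length)[:length]
    columns = [word_vector_store.get(w, np.zeros(E)) for w in padded]
    return np.stack(columns, axis=1)  # shape: (E, T)
```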

Step S202-2: The word sequence coding unit 103 converts the first document vector sequence X into the second document vector sequence H=[H₁, H₂, . . . , H_(T)] of size 2d×T by using an encoder in accordance with a neural network. Here, a bidirectional long short-term memory (LSTM) disclosed in Reference 2 below is used as the encoder, for example, where the size of the hidden state is d. Note that d may be set as, for example, d=100.

Reference 2

-   Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735-1780.
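Step S202-2 above could be realized, for example, with PyTorch's `torch.nn.LSTM` with `bidirectional=True`, concatenating the forward and backward hidden states into the 2d-dimensional vectors H_(t); this is one possible encoder, sketched under that assumption:

```python
import torch
import torch.nn as nn

E, d = 300, 100  # input dimensionality and hidden-state size

class DocumentEncoder(nn.Module):
    """Bidirectional LSTM mapping an E x T sequence to a 2d x T sequence."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=E, hidden_size=d, bidirectional=True)

    def forward(self, X):            # X: (E, T) first document vector sequence
        inputs = X.t().unsqueeze(1)  # (T, batch=1, E), as nn.LSTM expects
        H, _ = self.lstm(inputs)     # (T, 1, 2d) forward/backward states
        return H.squeeze(1).t()      # (2d, T) second document vector sequence
```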

Step S203: The word sequence coding unit 103 obtains a first question vector sequence Q and a second question vector sequence U by the following step S203-1 and step S203-2.

Step S203-1: The word sequence coding unit 103 searches the word vector storage unit 101 for each word included in the word sequence (q₁, q₂, . . . , q_(J)) from the beginning of the question to the J-th word, and converts each word q_(j) (j=1, 2, . . . , J) into a word vector e_(j), respectively. Then, the word sequence coding unit 103 obtains the first question vector sequence Q=[e₁ e₂ . . . e_(J)]∈R^(E×J) by using the word vectors e_(j) (j=1, 2, . . . , J) as a vector sequence. Accordingly, the word sequence (q₁, q₂, . . . , q_(J)) of the question is converted into the first question vector sequence Q. Here, J is the length of the word sequence; for example, J may be set to J=30.

When the length of the word sequence of the question is less than J, padding is performed with the special word “PAD”. On the other hand, when the length of the word sequence of the question exceeds J, the word sequence in the excess portion is ignored.

Step S203-2: The word sequence coding unit 103 converts the first question vector sequence Q into the second question vector sequence U=[U₁, U₂, . . . , U_(J)] of size 2d×J using the encoder in accordance with the neural network. Note that the encoder uses the bidirectional LSTM in which the size of the hidden state is d, as in step S202-2.

Step S204: The word sequence matching unit 104 calculates matching matrices M^(H) and M^(U) by the following step S204-1 to step S204-4.

Step S204-1: The word sequence matching unit 104 calculates a matching matrix S using the second document vector sequence H and the second question vector sequence U. Each element S_(tj) of the matching matrix S is calculated by the following equation.

$$S_{tj} = w_s^{\top}\left[H_t; U_j; H_t \circ U_j\right] \in \mathbb{R} \qquad [\text{Math. 1}]$$

Here, ⊤ represents transposition, ∘ represents the element-wise product of vectors, and the semicolon (;) represents concatenation of vectors. Additionally, w_(s)∈R^(6d) is a parameter of the neural network to be learned (that is, the neural network functioning as an encoder).
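A direct (unoptimized) numpy sketch of [Math. 1]: for every document position t and question position j, the 6d-dimensional concatenation [H_(t); U_(j); H_(t)∘U_(j)] is projected onto the parameter w_(s), which is passed in explicitly here; its initialization and learning are outside this sketch.

```python
import numpy as np

def matching_matrix_S(H, U, w_s):
    """H: (2d, T), U: (2d, J), w_s: (6d,). Returns S with S[t, j] per [Math. 1]."""
    T, J = H.shape[1], U.shape[1]
    S = np.empty((T, J))
    for t in range(T):
        for j in range(J):
            feature = np.concatenate([H[:, t], U[:, j], H[:, t] * U[:, j]])
            S[t, j] = w_s @ feature  # scalar matching score
    return S
```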

Step S204-2: The word sequence matching unit 104 uses the second document vector sequence H, the second question vector sequence U, and the matching matrix S to calculate attention-weighted average vectors Ũ_(j) and H̃_(t). For convenience, in the text of the disclosure, “X with a wavy line attached to the top” is also designated as X^(˜).

The attention-weighted average vectors Ũ_(j) and H̃_(t) are calculated respectively by the corresponding one of the following equations.

$$\tilde{U}_j = \sum_{t=1}^{T} \alpha_{jt}^{H} H_t \in \mathbb{R}^{2d}, \qquad \tilde{H}_t = \sum_{j=1}^{J} \alpha_{tj}^{U} U_j \in \mathbb{R}^{2d} \qquad [\text{Math. 2}]$$

Here,

$$\alpha_j^{H} = \mathrm{softmax}_t\left(S_j\right) \in \mathbb{R}^{T}, \qquad \alpha_t^{U} = \mathrm{softmax}_j\left(S_t\right) \in \mathbb{R}^{J} \qquad [\text{Math. 3}]$$

Additionally, S_(j) represents the column vector in the j-th column of the matching matrix S, and S_(t) represents the row vector in the t-th row of the matching matrix S.

Step S204-3: The word sequence matching unit 104 calculates vector sequences G^(U) and G^(H). The vector sequences G^(U) and G^(H) are calculated respectively by the corresponding one of the following equations.

$$G^{H} = \left[H; \tilde{H}; H \circ \tilde{H}\right] \in \mathbb{R}^{6d \times T}, \qquad G^{U} = \left[U; \tilde{U}; U \circ \tilde{U}\right] \in \mathbb{R}^{6d \times J} \qquad [\text{Math. 4}]$$

Step S204-4: The word sequence matching unit 104 converts the vector sequences G^(H) and G^(U) into matching matrices M^(H)∈R^(2d×T) and M^(U)∈R^(2d×J), respectively, by a single-layer bidirectional LSTM with a hidden state of size d.
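Continuing the sketch, [Math. 2] to [Math. 4] amount to a softmax over each column (respectively row) of S followed by weighted averaging and concatenation; the final bidirectional LSTM of step S204-4 is omitted here for brevity:

```python
import numpy as np

def softmax(x, axis):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def attended_sequences(H, U, S):
    """H: (2d, T), U: (2d, J), S: (T, J). Returns G^H (6d, T) and G^U (6d, J)."""
    alpha_H = softmax(S, axis=0)   # alpha^H_j: weights over document positions t
    alpha_U = softmax(S, axis=1)   # alpha^U_t: weights over question positions j
    U_tilde = H @ alpha_H          # (2d, J): column j is U~_j        [Math. 2]
    H_tilde = U @ alpha_U.T        # (2d, T): column t is H~_t
    G_H = np.vstack([H, H_tilde, H * H_tilde])  # (6d, T)             [Math. 4]
    G_U = np.vstack([U, U_tilde, U * U_tilde])  # (6d, J)
    return G_H, G_U
```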

Step S205: The answer generation unit 107 initializes the index k of the word y_(k) included in the answer sentence to k=1, and initializes the initial state s₀∈R^(2d) of the decoder based on the neural network with a zero vector. Additionally, the answer generation unit 107 sets the 0-th word y₀ included in the answer sentence to the special word <s>, which represents the beginning of a sentence. Here, as the decoder, a neural network such as a recurrent neural network (RNN), or an LSTM, which is a type of RNN, is used. Hereinafter, the word y_(k) included in the answer sentence is also represented as the “output word y_(k)”.

Step S206: The answer generation unit 107 searches the word vector storage unit 101 for the output word y_(k-1), and converts the output word y_(k-1) into an E-dimensional word vector z_(k-1).

Step S207: The document gaze unit 105 uses the state s_(k-1) of the decoder to calculate a gaze c_(k) ^(H) of each word included in the document by the following equations.

$$\nu_{kt} = F\left(M_t^{H}, s_{k-1}\right) \in \mathbb{R}, \qquad \beta_{kt}^{H} = \frac{\nu_{kt}}{\sum_{t'=1}^{T} \nu_{kt'}}, \qquad c_k^{H} = \sum_{t=1}^{T} \beta_{kt}^{H} M_t^{H} \in \mathbb{R}^{2d} \qquad [\text{Math. 5}]$$

Here, the score function F uses the inner product (M_(t) ^(H)·s_(k-1)). Note that, as the score function F, a bilinear function, a multilayer perceptron, and the like may be used.
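A sketch of [Math. 5] with the inner-product score function, normalizing the scores exactly as the equation is written; applying the same routine to M^(U) along the question axis yields the c_(k) ^(U) of step S208 below:

```python
import numpy as np

def gaze(M, s_prev):
    """M: (2d, T) matching matrix, s_prev: (2d,) decoder state s_{k-1}.
    Returns the weights beta (T,) and the gaze vector c_k (2d,), per [Math. 5]."""
    nu = M.T @ s_prev        # inner-product scores F(M_t, s_{k-1})
    beta = nu / nu.sum()     # plain normalization, as written in [Math. 5]
    return beta, M @ beta    # c_k = sum_t beta_t * M_t
```

A softmax over ν is a common alternative normalization when the raw inner products can be negative; the plain ratio above simply mirrors the equation as written.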

Step S208: The question gaze unit 106 uses the state s_(k-1) of the decoder to calculate a gaze c_(k) ^(U) of each word included in the question by the following equations.

$$\nu_{kj} = F\left(M_j^{U}, s_{k-1}\right) \in \mathbb{R}, \qquad \beta_{kj}^{U} = \frac{\nu_{kj}}{\sum_{j'=1}^{J} \nu_{kj'}}, \qquad c_k^{U} = \sum_{j=1}^{J} \beta_{kj}^{U} M_j^{U} \in \mathbb{R}^{2d} \qquad [\text{Math. 6}]$$

Here, the score function F uses the inner product (M_(j) ^(U)·s_(k-1)). Note that, as the score function F, a bilinear function, a multilayer perceptron, and the like may be used.

Step S209: The answer generation unit 107 updates the state s_(k) of the decoder by the following equation.

$$s_k = f\left(s_{k-1}, \left[z_{k-1}; c_k^{H}; c_k^{U}\right]\right) \qquad [\text{Math. 7}]$$

Here, a neural network such as an LSTM is used as the function f of the decoder, as described above. The neural network functioning as the decoder is to be learned. Note that an RNN other than the LSTM may be used as the decoder.

Step S210: The answer generation unit 107 calculates the document gaze score λ^(H) and the question gaze score λ^(U) respectively by the corresponding one of the following equations.

$$\lambda^{H} = \mathrm{sigmoid}\left(w^{H} \cdot s_{k-1}\right), \qquad \lambda^{U} = \mathrm{sigmoid}\left(w^{U} \cdot s_{k-1}\right) \qquad [\text{Math. 8}]$$

Here, w^(H)∈R^(2d) and w^(U)∈R^(2d) are parameters of the neural network to be learned (that is, the neural network functioning as a decoder).

Step S211: The answer generation unit 107 calculates a vector o_(k) representing the score at which each word in the vocabulary V′ is selected as the output word y_(k).

Here, it is assumed that the number of words included in the vocabulary V′ is represented as N, and that the score at which the n-th word included in the vocabulary V′ is selected as the output word y_(k) is represented as o_(k,n). In this case, the vector o_(k) can be represented as o_(k)=(o_(k,1), o_(k,2), . . . , o_(k,N)).

Additionally, it is assumed that the greater o_(k,n) is, the more easily the n-th word included in the vocabulary V′ is selected as the output word y_(k). At this time, by performing normalization such that 0≤o_(k,n)≤1 and the sum of the o_(k,n) is 1, o_(k)=(o_(k,1), o_(k,2), . . . , o_(k,N)) can be represented as a probability distribution p(y_(k)|y_(<k), X) of the conditional probability at which each word included in the vocabulary V′ is selected as the output word y_(k), given that the output words have been selected up to the (k−1)-th one. By using the document gaze score λ^(H) and the question gaze score λ^(U), the probability distribution p(y_(k)|y_(<k), X) is calculated, for example, by the following equation.

$$p\left(y_k \mid y_{<k}, X\right) = \lambda_{\max}\left(\frac{\lambda^{H}}{\lambda^{H} + \lambda^{U}} P_C^{H}\left(y_k \mid y_{<k}, X\right) + \frac{\lambda^{U}}{\lambda^{H} + \lambda^{U}} P_C^{U}\left(y_k \mid y_{<k}, X\right)\right) + \left(1 - \lambda_{\max}\right) P_G\left(y_k \mid y_{<k}, X\right) \qquad [\text{Math. 9}]$$

Here, it is assumed that λ_(max)=max(λ^(H), λ^(U)). Additionally, P_(C) ^(H), which is a probability distribution based on the words included in the document, and P_(C) ^(U), which is a probability distribution based on the words included in the question, are calculated, using the gaze distribution of the document and the gaze distribution of the question, respectively by the corresponding one of the following equations.

$$P_C^{H}\left(y_k \mid y_{<k}, X\right) = \frac{1}{\sum_{t=1}^{T} \beta_{kt}^{H}} \sum_{t=1}^{T} \beta_{kt}^{H} \cdot I\left(y_k = x_t\right), \qquad P_C^{U}\left(y_k \mid y_{<k}, X\right) = \frac{1}{\sum_{j=1}^{J} \beta_{kj}^{U}} \sum_{j=1}^{J} \beta_{kj}^{U} \cdot I\left(y_k = q_j\right) \qquad [\text{Math. 10}]$$

Here, I(·) is a function that outputs 1 when the predicate is true and outputs 0 when the predicate is false.

Also, P_(G) is the probability distribution based on the words included in the vocabulary V, and is calculated by the following equation.

$$P_G\left(y_k \mid y_{<k}, X\right) = \begin{cases} \dfrac{\exp\left(W_{y_k} \cdot \psi\left(s_k, y_{k-1}, c_k\right)\right)}{\sum_{k' \in V} \exp\left(W_{k'} \cdot \psi\left(s_k, y_{k-1}, c_k\right)\right)} & \text{if } y_k \in V \\[1ex] 0 & \text{otherwise} \end{cases} \qquad [\text{Math. 11}]$$

Here, as the function ψ, a neural network such as a multilayer perceptron can be used. Additionally, W∈R^(2d×V) is a parameter of the neural network to be learned (that is, the neural network functioning as the function ψ).
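Putting [Math. 8] to [Math. 11] together, one possible numpy sketch of the mixture in [Math. 9] is as follows; `vocab_index` (a word-to-position map over V′) and the precomputed generative distribution `p_G` (a vector over V′ that is zero outside V, per [Math. 11]) are assumptions of this example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def copy_distribution(beta, source_words, vocab_index):
    """[Math. 10]: spread the gaze weights over the V' positions of the words."""
    p = np.zeros(len(vocab_index))
    for weight, word in zip(beta, source_words):
        p[vocab_index[word]] += weight
    return p / beta.sum()

def output_distribution(s_prev, w_H, w_U, beta_H, beta_U,
                        doc_words, q_words, p_G, vocab_index):
    """[Math. 8] and [Math. 9]: mix the copy and generative distributions."""
    lam_H = sigmoid(w_H @ s_prev)   # document gaze score
    lam_U = sigmoid(w_U @ s_prev)   # question gaze score
    lam_max = max(lam_H, lam_U)
    p_C_H = copy_distribution(beta_H, doc_words, vocab_index)
    p_C_U = copy_distribution(beta_U, q_words, vocab_index)
    copy_mix = (lam_H * p_C_H + lam_U * p_C_U) / (lam_H + lam_U)
    return lam_max * copy_mix + (1.0 - lam_max) * p_G
```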

Step S212: The answer generation unit 107 judges whether the word y_(k)* (that is, the correct word) corresponding to the output word y_(k) among the words included in the correct answer sentence is the special word </s> representing the end of the sentence. When it is judged that the correct word y_(k)* is not the special word </s>, the question answering apparatus 100 executes processing of step S213. On the other hand, when it is judged that the correct word y_(k)* is the special word </s>, the question answering apparatus 100 executes processing of step S214.

Step S213: The answer generation unit 107 adds 1 to the index k of the output word y_(k). Then, the answer generation unit 107 executes processing of step S206 by using the incremented k. Accordingly, the processing of step S206 to step S212 is repeatedly executed for each k (k=1, 2, . . . ) until the correct word y_(k)* is the special word </s>.

Step S214: The parameter update unit 109 calculates the loss L relating to the training data acquired in step S201. The loss L is calculated, using the correct answer sentence included in the training data, a constant ω, and the probability distribution p representing the score calculated by the answer generation unit 107, by the following equations.

$$L = L_G + \omega\left(L_H + L_U\right) \qquad [\text{Math. 12}]$$

$$L_G = -\sum_{k} \ln\left(p\left(y_k^{*} \mid y_{<k}, X\right)\right)$$

$$L_H = -\sum_{k}\left[\lambda_k^{H*} \ln\left(\lambda_k^{H}\right) + \left(1 - \lambda_k^{H*}\right) \ln\left(1 - \lambda_k^{H}\right)\right]$$

$$L_U = -\sum_{k}\left[\lambda_k^{U*} \ln\left(\lambda_k^{U}\right) + \left(1 - \lambda_k^{U*}\right) \ln\left(1 - \lambda_k^{U}\right)\right]$$

Here,

-   y_(k)* is the k-th word in the correct answer sentence,
-   λ_(k) ^(H)* is 1 when the k-th word of the correct answer sentence is included in the document, and is 0 otherwise, and
-   λ_(k) ^(U)* is 1 when the k-th word of the correct answer sentence is included in the question, and is 0 otherwise.

The constant ω may be set as, for example, ω=1.
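Given per-step arrays for the quantities above, the loss of [Math. 12] reduces to a sum of a negative log-likelihood and two binary cross-entropy terms, as in this sketch:

```python
import numpy as np

def training_loss(p_correct, lam_H, lam_H_star, lam_U, lam_U_star, omega=1.0):
    """[Math. 12]: p_correct[k] = p(y_k* | y_<k, X); lam_H/lam_U are the
    per-step gaze scores and lam_*_star the 0/1 labels from the correct answer."""
    L_G = -np.sum(np.log(p_correct))
    L_H = -np.sum(lam_H_star * np.log(lam_H)
                  + (1 - lam_H_star) * np.log(1 - lam_H))
    L_U = -np.sum(lam_U_star * np.log(lam_U)
                  + (1 - lam_U_star) * np.log(1 - lam_U))
    return L_G + omega * (L_H + L_U)
```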

Step S215: The input unit 102 judges whether there is unacquired training data in the mini-batch. When it is judged that there is unacquired training data in the mini-batch, the question answering apparatus 100 executes the processing of step S201. Accordingly, the processing of step S202 to step S214 is executed for each piece of training data included in the mini-batch. On the other hand, when it is judged that there is no unacquired training data in the mini-batch (that is, when the processing of step S202 to step S214 has been executed on all the training data included in the mini-batch), the question answering apparatus 100 executes processing of step S216.

Step S216: The parameter update unit 109 calculates the average of the losses L calculated for the pieces of training data included in the mini-batch, and updates the parameter of the neural network to be learned by using the calculated average loss, for example, by the stochastic gradient descent method. Note that the stochastic gradient descent method is an example of a parameter optimization method, and an arbitrary optimization method may be used. Accordingly, the parameter of the neural network to be learned is updated using one mini-batch.

Although the output word y_(k) included in the answer sentence is not generated in the parameter update processing described above, the output word y_(k) may be generated by a method similar to step S312 in FIG. 7 described below.

Question Answering Processing

Hereinafter, question answering processing executed by the question answering apparatus 100 according to the embodiment of the present disclosure will be described with reference to FIG. 7. FIG. 7 is a flowchart illustrating an example of the question answering processing according to the embodiment of the present disclosure. As described above, the question answering apparatus 100 at the time of question answering includes each functional unit illustrated in FIG. 1.

Step S301: The input unit 102 inputs a document and a question.

The subsequent step S302 to step S311 are the same as step S202 to step S211 in FIG. 6, respectively, and thus the description will be omitted. However, the parameters learned in the learning processing are used as the parameters of the neural network.

Step S312: The answer generation unit 107 uses the vector o_(k) calculated in step S311 to select the output word y_(k) from the vocabulary V′.

For example, the answer generation unit 107 selects the word corresponding to the element with the maximum score among the elements o_(k,n) of the vector o_(k) from the vocabulary V′, and sets the selected word as the output word y_(k). The word corresponding to the element o_(k,n) having the maximum score is the word having the maximum probability p(y_(k)|y_(<k), X) of being selected as the output word y_(k).

In addition to the above, the answer generation unit 107 may, for example, select the output word y_(k) by sampling from the vocabulary V′ according to the probability distribution p(y_(k)|y_(<k), X).
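Both selection rules of step S312 can be sketched in a few lines; `vocabulary` is the list of V′ words aligned with the probability vector p, an assumption of this example:

```python
import numpy as np

def select_output_word(p, vocabulary, greedy=True, rng=None):
    """Step S312: pick y_k from V' greedily (maximum probability) or by
    sampling from the distribution p(y_k | y_<k, X)."""
    if greedy:
        return vocabulary[int(np.argmax(p))]
    rng = rng or np.random.default_rng()
    return vocabulary[rng.choice(len(vocabulary), p=p)]
```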

Accordingly, by using a learned neural network that identifies whether the output word y_(k) is included in the vocabulary B (that is, the vocabulary composed of words included in the document and the question), the probability distribution p(y_(k)|y_(<k), X) corresponding to the output word y_(k) can be increased or decreased for each k. Accordingly, words included in the vocabulary B can be accurately selected as output words y_(k) included in the answer sentence.

Step S313: The answer generation unit 107 judges whether the special word </s> representing the end of the sentence is selected as the output word y_(k) in step S312 described above. When it is judged that the special word </s> is not selected as the output word y_(k), the question answering apparatus 100 executes processing of step S314. Accordingly, the processing of step S306 to step S312 is repeatedly executed for each k (k=1, 2, . . . ) until the special word </s> is selected as the output word y_(k). On the other hand, when it is judged that the special word </s> is selected as the output word y_(k), the question answering apparatus 100 executes processing of step S315.

The subsequent step S314 is the same as step S213 in FIG. 6, and thus the description will be omitted.

Step S315: The answer generation unit 107 generates the answer sentence including the output words y_(k) (k=0, 1, 2, . . . ). Accordingly, the answer sentence to the document and the question input in step S301 is generated.

Step S316: The output unit 108 outputs the answer sentence generated instep S315 to a predetermined output destination.

Summary

As described above, when an arbitrary document and an arbitrary question to the document are given, the question answering apparatus 100 according to the embodiment of the present disclosure also uses the words included in the given document and question to generate an answer sentence by a sentence generation technology using a neural network. Accordingly, with the question answering apparatus 100 according to the embodiment of the present disclosure, for example, a situation where an unknown word is included in an answer sentence to the given document and question can be significantly reduced, and thus question answering with high accuracy can be achieved.

The present disclosure is not limited to the above-described embodiment specifically disclosed, and various modifications and changes can be made without departing from the scope of the claims.

REFERENCE SIGNS LIST

-   100 Question answering apparatus
-   101 Word vector storage unit
-   102 Input unit
-   103 Word sequence coding unit
-   104 Word sequence matching unit
-   105 Document gaze unit
-   106 Question gaze unit
-   107 Answer generation unit
-   108 Output unit
-   109 Parameter update unit

The invention claimed is:
1. A computer-implemented method for processing a query, the method comprising: receiving a document; receiving a question; receiving a first vocabulary, wherein the first vocabulary includes a predefined set of words; generating, based on words in the received document and the received question, a second vocabulary; generating, using a learnt model based on one or more words in a union of the first vocabulary and the second vocabulary, an answer sentence, wherein the learnt model comprises a learnt neural network determining whether the second vocabulary includes a word in the answer sentence, and wherein the learnt neural network, based on whether the second vocabulary includes the word in the answer sentence, determines a probability of selecting a word from the second vocabulary for generating the answer sentence; and providing the generated answer sentence in response to the received question.
2. The computer-implemented method of claim 1, the method further comprising: generating, based on a first sequence of word vectors of words in the document and a second sequence of word vectors of words in the question, a matching matrix; generating, based on the matching matrix, a first gaze of words in the document, wherein the first gaze includes a gaze score of a word of the words in the document; generating, based on the matching matrix, a second gaze of words in the question; and generating, based on a first distribution of the first gaze and a second distribution of the second gaze, a distribution of the probability of selecting the word, and wherein the probability of selecting the word from the second vocabulary for generating the answer sentence relates to the distribution of the probability of selecting the word.
3. The computer-implemented method of claim 2, wherein the first gaze of words includes a first gaze distribution of one of the words in the document, and wherein the second gaze of words includes a second gaze distribution of one of the words in the question.
4. The computer-implemented method of claim 2, the method further comprising: generating the matching matrix using a bidirectional long short-term memory of a recurrent neural network.
5. The computer-implemented method of claim 1, the method further comprising: receiving a training document; receiving a training question; receiving a correct answer sentence for the training question; generating, using the learnt neural network, a probability distribution, wherein the probability distribution relates to a probability of words in a union of the predefined first vocabulary and a third vocabulary including a word selected for a candidate answer sentence, wherein the third vocabulary includes words in the training document and the training question, and wherein the candidate answer sentence relates to responding to the training question; determining, based on a loss relating to the correct answer sentence and the generated probability distribution, a parameter; and updating, based on the determined parameter, the learnt neural network.
6. The computer-implemented method of claim 5, the method further comprising: receiving a plurality of correct answer sentences, wherein the plurality of correct answer sentences include the correct answer sentence; determining, based on the plurality of correct answer sentences and the generated probability distribution, a plurality of losses; determining an average value of the plurality of losses; and determining, based at least on the average value of the plurality of losses, the parameter.
7. The computer-implemented method of claim 1, the method further comprising: selecting, based on the probability of selecting the word from the second vocabulary, the word from the second vocabulary for the answer sentence.
8. A system for processing a question, the system comprising: a processor; and a memory storing computer-executable instructions that when executed by the processor cause the system to: receive a document; receive a question; receive a first vocabulary, wherein the first vocabulary includes a predefined set of words; generate, based on words in the received document and the received question, a second vocabulary; generate, using a learnt model based on one or more words in a union of the first vocabulary and the second vocabulary, an answer sentence, wherein the learnt model comprises a learnt neural network determining whether the second vocabulary includes a word in the answer sentence, and wherein the learnt neural network, based on whether the second vocabulary includes the word in the answer sentence, determines a probability of selecting a word from the second vocabulary for generating the answer sentence; and provide the generated answer sentence in response to the received question.
9. The system of claim 8, the computer-executable instructions when executed further causing the system to: generate, based on a first sequence of word vectors of words in the document and a second sequence of word vectors of words in the question, a matching matrix; generate, based on the matching matrix, a first gaze of words in the document, wherein the first gaze includes a gaze score of a word of the words in the document; generate, based on the matching matrix, a second gaze of words in the question; and generate, based on a first distribution of the first gaze and a second distribution of the second gaze, a distribution of the probability of selecting the word, and wherein the probability of selecting the word from the second vocabulary for generating the answer sentence relates to the distribution of the probability of selecting the word.
10. The system of claim 9, wherein the first gaze of words includes a first gaze distribution of one of the words in the document, and wherein the second gaze of words includes a second gaze distribution of one of the words in the question.
11. The system of claim 9, the computer-executable instructions when executed further causing the system to: generate the matching matrix using a bidirectional long short-term memory of a recurrent neural network.
12. The system of claim 8, the computer-executable instructions when executed further causing the system to: receive a training document; receive a training question; receive a correct answer sentence for the training question; generate, using the learnt neural network, a probability distribution, wherein the probability distribution relates to a probability of words in a union of the predefined first vocabulary and a third vocabulary including a word selected for a candidate answer sentence, wherein the third vocabulary includes words in the training document and the training question, and wherein the candidate answer sentence relates to responding to the training question; determine, based on a loss relating to the correct answer sentence and the generated probability distribution, a parameter; and update, based on the determined parameter, the learnt neural network.
13. The system of claim 12, the computer-executable instructions when executed further causing the system to: receive a plurality of correct answer sentences, wherein the plurality of correct answer sentences include the correct answer sentence; determine, based on the plurality of correct answer sentences and the generated probability distribution, a plurality of losses; determine an average value of the plurality of losses; and determine, based at least on the average value of the plurality of losses, the parameter.
14. The system of claim 8, the computer-executable instructions when executed further causing the system to: select, based on the probability of selecting the word from the second vocabulary, the word from the second vocabulary for the answer sentence.
15. A computer-readable non-transitory recording medium storing computer-executable instructions that when executed by a processor cause a computer system to: receive a document; receive a question; receive a first vocabulary, wherein the first vocabulary includes a predefined set of words; generate, based on words in the received document and the received question, a second vocabulary; generate, using a learnt model based on one or more words in a union of the first vocabulary and the second vocabulary, an answer sentence, wherein the learnt model comprises a learnt neural network determining whether the second vocabulary includes a word in the answer sentence, and wherein the learnt neural network, based on whether the second vocabulary includes the word in the answer sentence, determines a probability of selecting a word from the second vocabulary for generating the answer sentence; and provide the generated answer sentence in response to the received question.
16. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: generate, based on a first sequence of word vectors of words in the document and a second sequence of word vectors of words in the question, a matching matrix; generate, based on the matching matrix, a first gaze of words in the document, wherein the first gaze includes a gaze score of a word of the words in the document; generate, based on the matching matrix, a second gaze of words in the question; and generate, based on a first distribution of the first gaze and a second distribution of the second gaze, a distribution of the probability of selecting the word, and wherein the probability of selecting the word from the second vocabulary for generating the answer sentence relates to the distribution of the probability of selecting the word.
17. The computer-readable non-transitory recording medium of claim 16, wherein the first gaze of words includes a first gaze distribution of one of the words in the document, and wherein the second gaze of words includes a second gaze distribution of one of the words in the question.
18. The computer-readable non-transitory recording medium of claim 16, the computer-executable instructions when executed further causing the system to: receive a plurality of correct answer sentences, wherein the plurality of correct answer sentences include the correct answer sentence; determine, based on the plurality of correct answer sentences and the generated probability distribution, a plurality of losses; determine an average value of the plurality of losses; and determine, based at least on the average value of the plurality of losses, the parameter.
19. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: receive a training document; receive a training question; receive a correct answer sentence for the training question; generate, using the learnt neural network, a probability distribution, wherein the probability distribution relates to a probability of words in a union of the predefined first vocabulary and a third vocabulary including a word selected for a candidate answer sentence, wherein the third vocabulary includes words in the training document and the training question, and wherein the candidate answer sentence relates to responding to the training question; determine, based on a loss relating to the correct answer sentence and the generated probability distribution, a parameter; and update, based on the determined parameter, the learnt neural network.
20. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: select, based on the probability of selecting the word from the second vocabulary, the word from the second vocabulary for the answer sentence.